ABSTRACT

The time and expense required to certify the security of a
multi-level multi-access computer frequently make it necessary
to process classified data in a dedicated mode. This report
discusses the feasibility of a limited type of classified data
processing in a multi-access mode.

INTRODUCTION 

Privacy transformations have been proposed as a countermeasure to several types of
threats to the security of information stored in a large, multi-access computer system. Turn¹
summarizes some of the problems associated with the use of transformed data bases, including
distribution of keys, selective updating, and change of key, among others. Turn’s view of the
utility of privacy transformations for data banks seems to be oriented towards the problem of
protecting the contents of physically removable data storage units. Feistel et al.² have
suggested that on-line direct-access storage devices can be protected if file-access methods can be
devised to deal directly with cryptograms. No results have been published, however. Bayer
and Metzger³ have studied the problem of protecting information stored in indexed, random
access files by means of privacy transformations. They also consider threats to the security
of a transformed file due to updating and to the file structure itself. Based on these findings,
an approach is proposed which shifts the primary responsibility for protecting the information
from the central facility to an intelligent terminal. The software required for the intelligent
terminal could be made certifiable using available methods.

THE ENVIRONMENT

The information processing system under consideration has the following elements:

• A central processing facility under control of a commercially available operating system
and having the appropriate mass storage hierarchy.

• A collection of files of information - some of which are available for general use while
others are available only with proper authorization.

• A communications network consisting of common carrier, leased, or private lines.

• A geographically distributed set of query terminals - some of which are in areas to
which the public has access while others are in areas which are secure.

¹Turn, R., “Privacy Transformations for Databank Systems,” Proceedings of the National Computer
Conference, pp. 589-601 (1973).

²Feistel, H. et al., “Some Cryptographic Techniques for Machine-to-Machine Data Communication,”
Proceedings of the IEEE, Vol. 63, No. 11, pp. 1545-1554 (Nov 1975).

³Bayer, R. and J. Metzger, “On Encipherment of Search Trees and Random Access Files,” ACM Transactions
on Database Systems, Vol. 1, No. 1, pp. 37-52 (Mar 1976).

The system considered in this report is used for query/transaction processing. Transactions
originate at the terminals and are transmitted over the network to the central processing
facility. The appropriate files are accessed, the transaction is processed, and the transaction result
is transmitted over the communications network to the requesting terminal, where additional
processing may take place.


MODE OF OPERATION

Protection of information stored in a system such as that described above is an extremely
difficult problem which has received considerable attention. Entirely satisfactory solutions
have not been obtained. A practical approach to the problem which utilizes currently
available technology, affords a high degree of protection for a somewhat limited application, and
is reasonable in cost is outlined below.

Consider a group of analysts who in the course of their work require conversational
access to several files, one of them large (10^ entries) and highly sensitive. The analysts
perform retrievals, deletions, and updates on the files. Other groups of analysts may require
access to only the non-sensitive files. In the proposed approach, all of the files would be
stored in the on-line direct access memory of the central facility. The file containing sensitive
information would be protected by storing it in a suitably enciphered form. Transactions
(retrievals, deletions, updates) involving the sensitive file would be performed by analysts using
intelligent terminals located in secure areas. The intelligent terminals would perform the
deciphering and enciphering necessary to process the transactions. The details of the approach
are given in the next section.

IMPLEMENTATION 

In order to provide a reasonable response time for transactions processed at the intelligent
terminal, a suitable method for organizing the file for remote access must be selected. Bayer
and McCreight⁴ have studied the problem of organizing and maintaining an index for a
dynamically changing random access file. In their terminology, an index is a collection of index
elements which are pairs (x, a) of data items stored together, where x is a name or identifier
and a is (usually) a pointer into a file of associated information. It is assumed that the index
may be quite large, say 10^ elements, and that it is stored on direct access devices such as
discs or drums. Bayer and McCreight have devised a method for organizing the index which
allows insertion, retrieval, and deletion of elements in the index in an amount of time
proportional to $\log_k N$, where N is the number of elements in the index and k is a natural number
describing the page size most suitable for the direct access device being used. It is shown that
for a page size of 120 index elements and an index containing between $4.5 \times 10^5$ and
$2.1 \times 10^8$ elements, an index element can be retrieved from the random access index file in
no more than four reads.

⁴Bayer, R. and E. McCreight, “Organization and Maintenance of Large Ordered Indices,” Acta Informatica,
Vol. 1, No. 3, pp. 173-189 (1972).

In the scheme described by Bayer and McCreight, the index is organized into pages
containing between k and 2k index elements which are stored sequentially in increasing order.
The pages are the nodes of a recently defined type of tree called a B-tree, which is defined as
follows:

Let h be a non-negative integer, and k a positive integer. A directed tree T is in the
class τ(k, h) of B-trees if T is either empty (h = 0) or has the following properties:

• Each path from the root to any leaf has the same length h, which is called the height
of the tree; h is the number of nodes in the path.

• Each node except the root and the leaves has at least k + 1 sons. The root is a leaf
or has at least two sons.

• Each node has at most 2k + 1 sons.

It is proposed that the index be stored in the central facility in enciphered form in pages
of size 2k and accessed by a standard direct access method such as BDAM in OS/360. The
role of the remote intelligent terminal is to accept the transaction from the analyst and
perform the necessary retrieval, deciphering, enciphering, insertion, and deletion operations on
the index file. The direct access method on the central facility is used only to manipulate
encrypted pages.

As noted by Anderson,⁵ the technique selected to encipher the index file is of vital
importance since the structure of the file provides valuable clues to the cryptanalyst. It is
proposed that the file be enciphered with a block cipher using a different key for each page.
Bayer and Metzger³ describe a scheme which seems to satisfy Anderson’s requirements. Their
scheme is summarized briefly in the following paragraph.

⁵Anderson, J., “Information Security in a Multi-User Computer Environment,” Advances in Computers,
M. Rubinoff, Editor, New York (1972).


Consider a file F which is stored on secondary memory in m pages. Let each page $P_i$ have
an associated page number $p_i$ to locate it and a page ID $\bar{p}_i$. The plain and cipher text versions
of the i-th page are denoted by $Q(P_i)$ and $C(P_i)$ respectively. Two ciphers are used for file
encrypting, a text cipher U (which must be reversible) and a page key cipher E (which need
not be reversible). If $K_E$ is the key for E, then, given the page ID $\bar{p}_i$ for $P_i$, E is used to
calculate the corresponding page key $k_{P_i}$ by $k_{P_i} = E(\bar{p}_i, K_E)$. The page contents are then
encrypted by $C(P_i) = U(Q(P_i), k_{P_i})$ and decrypted by $Q(P_i) = U^{-1}(C(P_i), k_{P_i})$. Bayer and
Metzger suggested assigning arbitrary values to the $\bar{p}_i$ and storing them in a table, ordered by
page number. In our proposal, the page ID could be generated from the page number using a
suitable cipher and a separate key if desired. In this approach, each user of a file would be
required to know two keys, $K_E$ and the key required to generate the page IDs.
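The key flow just described can be illustrated with a short sketch. It is not the cipher pair used by Bayer and Metzger: here the page key cipher E and the page ID generator are both modelled with HMAC-SHA256 (an assumption, chosen because E need not be reversible), and the text cipher U is replaced by a toy XOR keystream that serves only to show the reversible step. The names page_id, page_key, encipher_page and decipher_page are hypothetical.

```python
import hashlib
import hmac


def page_id(page_number: int, id_key: bytes) -> bytes:
    """Generate a page ID from the page number under a separate key (assumed: HMAC-SHA256)."""
    return hmac.new(id_key, page_number.to_bytes(8, "big"), hashlib.sha256).digest()


def page_key(pid: bytes, k_e: bytes) -> bytes:
    """Page key cipher E: k = E(page ID, K_E). One-way is sufficient, so a keyed hash is used."""
    return hmac.new(k_e, pid, hashlib.sha256).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Expand a page key into `length` bytes by hashing the key with a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encipher_page(plain: bytes, key: bytes) -> bytes:
    """Toy stand-in for the text cipher U: XOR with a key-derived stream (illustration only)."""
    return bytes(a ^ b for a, b in zip(plain, _keystream(key, len(plain))))


# XOR with the same keystream undoes itself, so U^-1 is the same function here.
decipher_page = encipher_page
```

A user holding both keys derives the page key for page number n as page_key(page_id(n, id_key), k_e) and never needs a stored table of page IDs. A real implementation would substitute a vetted block cipher for the XOR stand-in.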

The algorithms for retrieval, insertion, and deletion of nodes on the B-tree are rather
simple and are discussed in Appendix A. If h is the height of a given B-tree, f the number of
pages which must be fetched from secondary storage for a transaction, and w the number of
pages which must be written, then the number of fetches/writes required for the three
operations in the worst case are given as follows:


Operation      Fetches        Writes

Retrieve       f = h          w = 0

Delete         f = 2h - 1     w = h + 1

Insert         f = h          w = 2h + 1
The following gives the size range, in terms of elements, of index files which may exist
for various values of h and a page size of 120 items.


Height of      Minimum        Maximum
Page Tree      Index Size     Index Size

1              1              120

2              121            14640

3              7441           1771560

4              453961         214358880

5              27691681       2.59 X 10^10
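The entries in this table follow from the B-tree definition: a tree of height h holds at least 2(k + 1)^(h-1) - 1 and at most (2k + 1)^h - 1 elements, with k = 60 for a page size of 120. A minimal sketch that reproduces the figures is given here; the function names are hypothetical.

```python
def index_size_bounds(h: int, k: int = 60) -> tuple[int, int]:
    """Return (minimum, maximum) number of index elements in a B-tree of height h."""
    if h < 1:
        raise ValueError("height must be at least 1")
    minimum = 2 * (k + 1) ** (h - 1) - 1   # root holds 1 element, all other pages hold k
    maximum = (2 * k + 1) ** h - 1         # every page holds 2k elements
    return minimum, maximum


def height_needed(n: int, k: int = 60) -> int:
    """Smallest height whose maximum capacity is at least n elements."""
    h = 1
    while index_size_bounds(h, k)[1] < n:
        h += 1
    return h


for height in range(1, 6):
    print(height, *index_size_bounds(height))
```

With these bounds, height_needed(2 * 10**8) evaluates to 4, which is the case used in the response time estimate.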
DISCUSSION OF FEASIBILITY


Two aspects of the feasibility of this approach are discussed - the degree of protection
afforded, and the response time.

DEGREE OF PROTECTION

• The intelligent terminal can be placed in a secure, shielded enclosure.

• Active and passive wire-tapping threats can be countered using techniques described in
Kent⁶ and Feistel et al.² Kent presents the design of protocols and protection
modules for a host and intelligent terminals. Kent claims his design prevents the disclosure
of message contents, provides detection of message stream modification, and provides
detection of denial of message service.

• The encryption scheme described by Bayer and Metzger was designed for structured
files and includes the following features:

(a) Each block of information in the file is encrypted with a different key, thus
countering cryptanalysis techniques requiring long cipher text strings.

(b) If a block of the file is deciphered, it provides the cryptanalyst with little
assistance in breaking other blocks. Defenses against this threat are discussed at length
in their paper.

• The approach described does not rely on certification of the central system software
to provide data protection, although some degree of certification may be required to
provide the necessary data integrity. The approach does require certification of the
intelligent terminal software, which is a comparatively simple task.

RESPONSE TIME

Bayer and McCreight⁴ provide estimates for the number of pages from an index file which
must be examined to perform a simple retrieval. Based on their example, timing estimates for
the scheme proposed here are determined by

$T = (T_a + T_t + T_d)h$

where $T_a$ = Time required to access a page on the central facility

$T_t$ = Time required to transmit a page to the intelligent terminal

$T_d$ = Time required to decipher the page in the intelligent terminal

h = Number of pages to be fetched and searched.

⁶Kent, S., “Encryption-Based Protection Protocols for Interactive User-Computer Communication,” MIT
Laboratory for Computer Science Report 162 (May 1976).

Assume that the processing time for the retrieval algorithm itself may be neglected. Assume a
page size of 120 index elements, each 14 bytes. With this page size, an index file of up to
$2 \times 10^8$ entries can be stored in a tree of height h = 4. Let us assume a value of 100 ms for
$T_a$. To calculate $T_t$, assume we have 4800 bps communication lines. Thus
$T_t = 120 \times 14 \times 8 / 4800 = 2.8$ seconds.

Assume that the file was encrypted using the National Bureau of Standards algorithm,
and that 50 ms are required to decipher 8 bytes on the intelligent terminal. Assume also that
each page is scanned using a binary search technique and that index elements are deciphered
only when necessary. Then $T_d = (\log_2 120 - 1) \times 14 \times 0.05 / 8 = 0.5$ seconds and thus
$T = (0.1 + 2.8 + 0.5) \times 4 = 13.6$ seconds. If deciphering were done in hardware and with wider
bandwidth communications, the response time could be greatly reduced, even if a progressive
cipher were used.
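The estimate can be reproduced, and the assumptions varied, with a few lines of code. The sketch below uses the figures stated above as defaults; the function and parameter names are hypothetical. With the three per-page terms rounded to 0.1, 2.8 and 0.5 seconds, the per-page time is 3.4 seconds and four pages take 13.6 seconds; carrying the unrounded deciphering term gives a slightly larger value.

```python
import math


def retrieval_time(h: int = 4,
                   page_elements: int = 120,
                   element_bytes: int = 14,
                   line_bps: int = 4800,
                   access_s: float = 0.1,
                   decipher_s_per_8_bytes: float = 0.05) -> tuple[float, float, float]:
    """Return (T_t, T_d, T) in seconds for retrieving one element through h pages."""
    # Transmission: the whole enciphered page is sent over the line.
    t_t = page_elements * element_bytes * 8 / line_bps
    # Deciphering: only the elements probed by the binary search are deciphered.
    probes = math.log2(page_elements) - 1
    t_d = probes * element_bytes * decipher_s_per_8_bytes / 8
    total = (access_s + t_t + t_d) * h
    return t_t, t_d, total


t_t, t_d, total = retrieval_time()
print(f"T_t = {t_t:.2f} s, T_d = {t_d:.2f} s, T = {total:.1f} s")
```

The transmission term dominates under these assumptions, so the line speed is the parameter with the most influence on response time.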

SUMMARY 

This report discusses the feasibility of performing operations on data stored in encrypted
form on a central facility. It has shown that, using available techniques, transactions on
encrypted files may be performed with a reasonable response time and with a high degree of
protection on commercially available systems. It is envisioned that intelligent terminals used
for this purpose in the future will be more powerful personal computers, and that the role of
the central facility will become that of a back-end data management system.


APPENDIX A

B-TREES AND THEIR MANIPULATION


Recall that the pages on which the index is stored are the nodes of a B-tree and that they
contain up to 2k elements, where an element consists of an identifier (x), a pointer (a) to a
file containing associated information, and a pointer (p) to another page in the index. The
data structure of the index also has the following properties:

• Each page holds between k and 2k elements except for the root page, which may hold
between 1 and 2k elements.

• Let the number of elements on a page P, which is not a leaf, be L. Then P has L + 1
sons.

• Within each page P, the elements are sequential in increasing order: $x_1, x_2, \ldots, x_L$;
$k \le L \le 2k$ except for the root page, for which $1 \le L \le 2k$. P also contains L + 1
pointers $p_0, p_1, \ldots, p_L$ to the sons of P.

The logical structure of a page is shown in the following figure.


p_0 | x_1 a_1 p_1 | x_2 a_2 p_2 | ... | x_L a_L p_L | (Unused Space)

Figure 1 - Page Structure


• Let $P(p_i)$ be the page to which $p_i$ points, and let $K(p_i)$ be the set of keys on the pages of
that maximal subtree of which $P(p_i)$ is the root. Then for these B-trees the following
conditions hold:

$(\forall y \in K(p_0))\ (y < x_1)$

$(\forall y \in K(p_i))\ (x_i < y < x_{i+1}),\quad i = 1, 2, \ldots, L - 1$

$(\forall y \in K(p_L))\ (x_L < y)$

Now, let p, r, s be pointer variables which may assume the value “U” meaning undefined;
r points to the root and is U if the tree is empty, and let y be the key. Let P(p) be the page
to which p is pointing; then $x_1, \ldots, x_L$ are the keys in P(p) and $p_0, \ldots, p_L$ are the page
pointers in P(p).

Figure 2 is a flow chart of the algorithm for retrieving an identifier. In actual practice,
the sequential searches within a node would be replaced by a binary search.
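The retrieval procedure with a binary search inside each page can be sketched as follows. This is an illustrative reading of the algorithm, not a transcription of the flow chart. The Page class and the fetch callback are hypothetical; in the proposed scheme fetch would request the enciphered page from the central facility and decipher it at the terminal. None plays the role of the undefined pointer value U.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    keys: list                                  # x_1 < x_2 < ... < x_L
    data: list                                  # a_1 ... a_L, pointers to associated information
    sons: list = field(default_factory=list)    # p_0 ... p_L page numbers; empty for a leaf


def retrieve(y, root: Optional[int], fetch: Callable[[int], Page]):
    """Search for key y starting at the root. Returns (a, pages_read); a is None if y is absent."""
    p = root
    reads = 0
    while p is not None:
        page = fetch(p)
        reads += 1
        i = bisect_left(page.keys, y)           # index of the first key >= y
        if i < len(page.keys) and page.keys[i] == y:
            return page.data[i], reads
        # y lies between x_i and x_(i+1), so follow p_i; a leaf has no sons.
        p = page.sons[i] if page.sons else None
    return None, reads
```

The number of pages read never exceeds the height of the tree, which is the quantity h used in the response time estimate.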

Figure 3 is a flow chart of the insertion algorithm. The split page routine performs the
following:

• Insert the entry into the sequence of entries in P (in main store), resulting in
$p_0, (x_1, p_1), (x_2, p_2), \ldots, (x_{2k+1}, p_{2k+1})$.

• Put the subsequence $p_0, (x_1, p_1), \ldots, (x_k, p_k)$ into P and introduce a new page P' to
contain the subsequence $p_{k+1}, (x_{k+2}, p_{k+2}), \ldots, (x_{2k+1}, p_{2k+1})$. Let Q be the father
page of P. Insert the entry $(x_{k+1}, p')$, where p' points to P', into Q. P' then becomes
a brother of P. These pages are appropriately encrypted and transmitted to the
central facility.
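The split step can be made concrete with a small sketch. It operates on the overfull sequence of 2k + 1 entries held in main store and returns the contents of P, the key that moves up into the father page Q, and the contents of the new page P'. The function name and the tuple layout are hypothetical; entries are shown as (x, p) pairs as in the description above, and the enciphering and transmission of the resulting pages are left to the caller.

```python
def split_page(p0, entries, k):
    """Split an overfull page.

    p0      -- the leftmost pointer p_0
    entries -- list of 2k+1 pairs (x_j, p_j) in increasing order of x
    Returns (left, separator, right):
      left      = (p_0, [(x_1, p_1), ..., (x_k, p_k)])                  stays in P
      separator = x_(k+1), to be inserted into Q with a pointer to P'
      right     = (p_(k+1), [(x_(k+2), p_(k+2)), ..., (x_(2k+1), p_(2k+1))])  goes to P'
    """
    if len(entries) != 2 * k + 1:
        raise ValueError("split_page expects exactly 2k + 1 entries")
    middle_key, middle_ptr = entries[k]
    left = (p0, entries[:k])
    right = (middle_ptr, entries[k + 1:])
    return left, middle_key, right
```

Both resulting pages hold exactly k entries, so the lower bound on page occupancy is preserved after the split.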

The deletion algorithm is shown in Figure 4.


[Flow chart. Box labels: START; analyst types in key y; apply retrieval algorithm for key y;
y found? (y in index: stop); is P(s) full? (perform split page routine for P(s));
tree is empty: create root page with y.]

Figure 3 - Insertion Algorithm